A high degree of reliability must be obtained from modern
electronic equipment if its use is to be extended in the future.
Because of their size and because of the presence of a memory which
remembers incorrect as well as correct information, reliability of
electronic digital computers is an even more difficult problem than
for more common equipment.

Since most component failures in conservatively designed
circuits result from gradual deterioration, checks of performance
margins permit removal of failing components before operation failure
occurs. By submitting circuits to strained operating conditions, such
as decreasing screen-grid voltage for amplifiers, the condition of
vacuum tubes and other components is checked in place. This preventive
maintenance is called marginal checking. The amount of additional
equipment needed for detection and signal source switching depends
upon the degree of reliability required. In electronic computers,
detection and source switching can be done with the proper test
program of computer instructions, so that the procedure may be highly
automatized without excessive additional equipment.

Simply varying power supply voltages is inadequate because
particular deteriorated components are not isolated. However,
grouping of circuits into sections not used simultaneously gives good
isolation.

Preventive maintenance in the form of marginal checking
applied to a 400-tube system improved reliability by over 50 to 1.
Preliminary results indicate that equally good results can be
obtained with larger systems.
A high degree of reliability must be obtained from modern
electronic systems for two obvious reasons. The investment in the
equipment itself may be quite high, so that the user justifiably
expects dependable operation over a long period. On the other hand,
large sums of money or even human life may be staked upon proper
operation at specific times. An example of the latter may occur in the
future when an electronic computer directs an aircraft as it
approaches an airport for landing. The computer in this case receives
location, speed, and altitude information, perhaps from a radar set,
checks this information against that of all other planes in the
vicinity, and directs the pilot how to proceed, all in a small
fraction of a second. Reliability is of utmost importance in such a
system. It is not sufficient to keep errors below a certain minimum.
It is necessary to prevent their occurrence.

In considering the problem of reliability for electronic
equipment, this paper will describe a technique called marginal
checking. As a specific example of this technique, methods and results
obtained in electronic digital computer research at the
Servomechanisms Laboratory, Massachusetts Institute of Technology,
will be given.

Reliability in computers is an even greater problem than in
more common electronic systems. One wrong letter does not void a
teletype message, ignition noise does not completely void television,
nor does an arcing magnetron nullify the plot on a radar screen.
Performance is still considered satisfactory if these occur only
infrequently. With digital computers, however, a single disturbance
can invalidate the whole effort. This is due to the high concentration
of information and to the presence of a memory within the computer.
The memory remembers incorrect information as well as the correct, and
once an error finds its way into the memory, it can propagate itself
into all subsequent operation.

Also the size of digital computers adds to the reliability
problem. Most of the large scale digital machines under development
use many thousands of vacuum tubes, crystal rectifiers, resistors,
capacitors, and inductors. Vacuum tubes and crystal rectifiers are
the most unreliable of these, but operation failures due to other
components may be expected because there are so many. A typical
computer may have 5000 vacuum tube cathodes and 10,000 crystal
rectifiers. Assuming an average life of 5000 hours for tubes and
10,000 hours for crystal rectifiers, a failure may be statistically
expected every 30 minutes from these aging components. Even if
trouble-location is well developed, so that repair time is short,
operating efficiency would be very low. A natural question is whether
periodic replacement of certain components would improve efficiency.
Unfortunately, early failure in groups of new tubes is quite high, so
that wholesale replacement on a time basis might even increase the
failure rate.
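
The 30-minute figure can be checked with a short calculation. The
sketch below assumes that each component fails at a constant average
rate equal to the reciprocal of its average life, so that the rates of
all components simply add. The function name is chosen here for
illustration only.

```python
def mean_time_between_failures_hours(populations):
    """populations: list of (count, average_life_hours) pairs.

    Assumes each component fails at a constant average rate of
    1 / average_life_hours, so the individual rates add.
    """
    total_rate_per_hour = sum(count / life for count, life in populations)
    return 1.0 / total_rate_per_hour


# Figures from the text: 5000 tube cathodes at 5000 hours average life,
# 10,000 crystal rectifiers at 10,000 hours average life.
mtbf_hours = mean_time_between_failures_hours(
    [(5000, 5000.0), (10000, 10000.0)]
)
mtbf_minutes = mtbf_hours * 60.0
print(mtbf_minutes)  # 30.0
```

Each population contributes one expected failure per hour, giving two
per hour in total, or one every 30 minutes.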

The picture is indeed dark, unless a very important fact is
recognized: most component failures in conservatively designed
circuits result from a gradual change in their characteristics. This
is the basis of marginal checking. If a circuit containing a component
whose deterioration is not sufficient to cause trouble in normal
operation is subjected to an abnormal strain, faulty operation will
result. The amount of strain necessary to cause failure is called the
operating margin. Marginal checking applies this strain and observes
the result in a routine maintenance period. Removal of components
causing low margins insures a predictable life expectancy of all other
components. This procedure is somewhat analogous to the common
insulation-breakdown tests. However, marginal checking produces no
damage to components and is applied by built-in facilities.
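
The idea of an operating margin can be put in the form of a small
routine. The sketch below is only an illustration: the function names,
the step size, and the two model circuits are assumptions made here,
not part of the equipment described. The strain is raised in steps and
the largest value still tolerated is reported, which brackets the
margin from below.

```python
def operating_margin(circuit_works, max_strain, step):
    """Raise the strain in steps and return the largest strain at which
    the circuit still works, or None if it fails with no strain at all.

    circuit_works: hypothetical callable taking a strain value and
    returning True while operation is still correct.
    """
    if not circuit_works(0.0):
        return None  # already failing in normal operation
    margin = 0.0
    n = 1
    while n * step <= max_strain and circuit_works(n * step):
        margin = n * step
        n += 1
    return margin


def needs_replacement(margin, minimum_margin):
    """A component is removed when its margin is below the accepted minimum."""
    return margin is None or margin < minimum_margin


# Hypothetical circuits: each fails once the strain exceeds its tolerance.
healthy = lambda strain: strain <= 40.0
aging = lambda strain: strain <= 12.0

healthy_margin = operating_margin(healthy, max_strain=50.0, step=5.0)
aging_margin = operating_margin(aging, max_strain=50.0, step=5.0)
print(healthy_margin, aging_margin)  # 40.0 10.0
```

With an accepted minimum of, say, 20 units, the aging circuit would be
flagged for attention while still operating correctly under normal
conditions.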

In designing a marginal checking system, each circuit must be
examined to see how a strain may be applied. Imminent failures must
be converted to real failures during the maintenance period by action
from outside the circuit. Many possibilities exist, such as changing
the character of the input signal, changing a supply voltage, or
changing the output loading. Some examples of how marginal checking is
applied to computer circuits are given below.

Figure 1 gives a typical basic block diagram often encountered
in computers and other pulse systems. Gate tube A, when open, allows
pulses to pass along a channel to a flip-flop. If the pulses are large
enough and the flip-flop is in proper condition, each pulse will cause
a reversal of the flip-flop from a 1 to a 0 or vice versa.

Two sorts of trouble may develop: first, a component of the
gate circuit may deteriorate, causing the pulse amplitude to reduce to
a point where the flip-flop will not switch, or, second, the flip-flop
may refuse to switch because one of its components has deteriorated.

Gate tube B is controlled by the flip-flop. The application of
a sensing pulse at B permits checking to see if the flip-flop has
received and properly acted upon pulses from A. The operating margins
of these circuits may be measured by voltage variation as shown in the
following paragraphs.

The gate circuit shown in Figure 2 is a video amplifier which
can be switched on and off by its #3 grid. The margin of performance
in the gate tube can be checked by lowering the voltage on the screen
of the tube. As shown in the figure, this is done by inserting a
negative voltage in series with the screen grid lead. Under these
conditions the pulses emerging from the tube will be lower than they
were before the deviation. This effect makes the tube look weaker.

Figure 3 is a simplified schematic of a flip-flop. A flip-flop
is a circuit having two stable states, each state being determined by
which of its two tubes is conducting. The circuit is symmetrical.

One tube must have the ability, when conducting, to hold the
other tube in a non-conducting state. Tube deterioration shows up as
a reduction in plate current in one tube with a consequent reduction
of bias available to the opposite tube. The use of a cathode resistor
allows considerable aging before the condition becomes intolerable,
but eventually tube deterioration will become so extreme that
instability will occur and the flip-flop will favor one side. Then,
whenever it is ordered to change sides by an incoming pulse, the
circuit will either fail to switch or fail to hold its new position
after switching takes place.

This unfavorable condition can be detected before it leads to
failure by feeding the two screen circuits of the flip-flop
separately, as shown, and selectively raising the screen voltage of
the normally off tube. Raising its screen voltage also raises its
no. 1 grid cut-off voltage. The normally on tube must have a safe
margin of plate current available if it is to hold the tube being
checked off under these extreme conditions. If the on tube is weak it
will fail to hold off the opposite tube and a spurious switching
operation will result. The detection of this condition can be made
automatic by applying sensing pulses to a gate tube attached to the
flip-flop as in Figure 1.
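
The flip-flop check may be pictured with a deliberately crude model.
Everything numerical in the sketch below is an assumption made for
illustration: the bias produced per milliampere of plate current, the
ratio relating screen voltage to grid cut-off voltage, and the two
screen voltages are not taken from the actual circuit. The model only
shows why a tube that still works normally can be caught when the
screen voltage of the opposite tube is raised.

```python
# All constants below are illustrative assumptions, not measured values.
BIAS_VOLTS_PER_MA = 3.0   # hold-off bias produced per mA of on-tube plate current
SCREEN_MU = 5.0           # assumed ratio of screen voltage to grid cut-off voltage
NORMAL_SCREEN_V = 100.0   # screen voltage of the off tube in normal operation
RAISED_SCREEN_V = 130.0   # screen voltage of the off tube during the check


def holds_off(on_tube_plate_ma, off_tube_screen_v):
    """True if the on tube supplies more bias than the off tube needs
    to remain cut off at the given screen voltage."""
    bias_available = BIAS_VOLTS_PER_MA * on_tube_plate_ma
    cutoff_needed = off_tube_screen_v / SCREEN_MU
    return bias_available > cutoff_needed


def marginal_check(on_tube_plate_ma):
    """Classify the on tube as 'ok', 'low margin', or 'failed'."""
    if not holds_off(on_tube_plate_ma, NORMAL_SCREEN_V):
        return "failed"        # spurious switching in normal operation
    if not holds_off(on_tube_plate_ma, RAISED_SCREEN_V):
        return "low margin"    # works normally, fails only under strain
    return "ok"


print(marginal_check(10.0))  # ok
print(marginal_check(8.0))   # low margin
print(marginal_check(6.0))   # failed
```

The middle case is the one marginal checking is meant to find: the
tube has aged but has not yet caused an operating failure.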

Time limits discussion to these two conversion methods. Others
are equally effective. Clamping crystals have been checked by changing
the timing sequence. Transformers and line terminations have been
checked by varying the frequency of pulse sequences.

In applying voltage variation as a means of marginal checking,
it is uneconomical to provide separate variation facilities for each
circuit. Moreover, simply varying power supply voltages is not enough.
This would allow checking overall operating margins but would be of
small value, because little information as to what components cause
low margins is obtained thereby. An economical compromise may be
obtained by grouping similar circuits of separate channels into
sections. If checking of the channels is then done in time sequence,
effective isolation of deteriorated components is obtained.

Figure 4 shows how a computer may be sectioned for marginal
checking. Three of many channels are shown. The vertical lines
indicate the sections into which are grouped similar circuits of
different channels. As voltage variation is applied to each section,
the pulse source sends signals through each channel in time sequence
and, at the same time, through the checking channel to the checking
section or detector. Failure to receive the proper signal at the
detector causes the whole sequence to stop and an alarm to sound. The
channel and section coordinates of the faulty stage are indicated by
the stopping point of the sequence. Thus a high degree of isolation is
obtained. Marginal checking of the pulse source, the detector, and the
checking channel must be done separately from the other channels, but
the same philosophy of grouping can be applied.
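
The sequence just described amounts to a scan over two coordinates.
The sketch below is a minimal illustration, assuming a hypothetical
function that reports whether the detector received the proper signal
for a given section and channel. The section and channel counts and
the position of the weak stage are invented for the example.

```python
def marginal_check_scan(sections, channels, stage_passes):
    """Apply the voltage excursion to one section at a time and pulse
    each channel in turn. Stop at the first stage whose signal fails to
    reach the detector and return its (section, channel) coordinates,
    or None if every stage passes.

    stage_passes: hypothetical callable (section, channel) -> bool
    giving the detector result while that section is under strain.
    """
    for section in sections:
        for channel in channels:
            if not stage_passes(section, channel):
                return (section, channel)  # sequence stops, alarm sounds
    return None


# Hypothetical machine: 4 sections, 3 channels, one deteriorated stage.
weak_stages = {(2, 1)}
result = marginal_check_scan(
    range(4), range(3), lambda s, c: (s, c) not in weak_stages
)
print(result)  # (2, 1)
```

The stopping point identifies one stage out of twelve in this example,
whereas varying a common supply voltage would only show that something
in the machine had a low margin.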

In electronic computers the marginal checking routine can be
automatized to a great extent. Switching of the pulse source and the
detector from channel to channel is accomplished by a set of computer
instructions in the form of a specially prepared test program. The
only extra equipment needed is for switching voltage variation
facilities from section to section.

In the Whirlwind computer system some 200 sections are used.
With the proper computer program, pulses are sent through each channel
in a fraction of a second. The marginal checking control panel is
shown in Figure 5. A register of indicator lights on the left shows
the section under test. The telephone dial in the center is used for
manually selecting a given section. The voltmeter at the right
indicates the voltage excursion applied. In manual operation, the dial
beneath the voltmeter controls an amplidyne generator which provides
the voltage excursion. In automatic operation, telephone-type stepping
switches shown in Figure 6 select the various sections in sequence.
The checking time for each section is 5 seconds, so that the whole
system may be checked in about 15 minutes.

The potentiality of marginal checking is shown by its
performance record. Consider the record of a 5-binary-digit prototype
arithmetic element containing about 400 vacuum tubes. It is set up to
solve a test problem and checks the result continuously 24 hours a
day. Marginal checking is performed daily during a 1/2 hour preventive
maintenance period and deteriorated components are replaced. Several
runs of three weeks without computational error have been made. A
recent period of 45 days contained no errors. This represents over
5 billion correct solutions, and about 10^[exponent illegible] correct
flip-flop reversals in 25 flip-flop circuits. During this time,
16 tubes, 7 crystals, and 4 resistors were replaced during marginal
checking periods because of low margins.

It is expected that for larger systems equipped with marginal
checking, errors will not increase in proportion to the extra
equipment involved. In the equipment just described, a high percentage
of errors are caused by power failure, thunderstorms, and other
external disturbances independent of the number of vacuum tubes used.
This conclusion is verified by experience with the Whirlwind Computer
now under test during its installation. The following table shows tube
and crystal rectifier failures.

                 TUBE AND CRYSTAL FAILURES
                (2750* Hours of Operation)

                                      Tubes     Crystals

Number in Use                          3500         9500

Total Failures From All Causes          128          176

Located by Marginal Checking             78          148

Note: *2750 hours for majority; minority of tubes were
      installed later.
      Figures to March 31, 1950.

Even though adequate marginal checking facilities were not
available for some of the period, of 128 tube failures, 78 or 61% were
detected by marginal checking before they caused operational failure.
Many of the remaining failures were from obvious causes such as
mechanical failure or extreme gassiness causing blown fuses. Of 176
crystal rectifier failures, 84% or 148 were removed during preventive
maintenance.
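
The percentages can be reproduced directly from the failure counts.
The short check below reads the counts as 78 of 128 tube failures and
148 of 176 crystal rectifier failures located by marginal checking.
The function name is chosen here for illustration only.

```python
def detected_fraction(located_by_checking, total_failures):
    """Fraction of all failures that were located by marginal checking."""
    return located_by_checking / total_failures


tube_fraction = detected_fraction(78, 128)
crystal_fraction = detected_fraction(148, 176)

# Rounded to whole percentages.
print(round(tube_fraction * 100), round(crystal_fraction * 100))  # 61 84
```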

Marginal checking as described does not eliminate the
inevitable intermittent faults as such. However, many so-called
intermittents are actually caused by deterioration just to the point
where obscure and minute external disturbances cause failure sometimes
and sometimes not. These faults are uncovered by marginal checking.
Moreover, actual intermittent faults such as broken welds in vacuum
tubes and poorly soldered joints can be attacked more directly and
with more assurance if the condition of other components has been
established by marginal checking.

In summary, marginal checking is aimed at the detection of
aging components before they cause operational failure. Results have
already shown that the concept of built-in preventive maintenance
equipment is sound. The method of application to specific electronic
devices depends upon the case at hand. The degree of elaborateness of
the additional equipment is determined by the degree of reliability
required.

Fig. 1. Typical pulse circuit

Fig. 2. Marginal checking of gate circuit

Fig. 3. Marginal checking of flip-flop circuit

Fig. 4. Computer marginal checking

Fig. 5. Marginal checking control panel

Fig. 6. Marginal checking relays



